Factored Neural Machine Translation

Authors

  • Mercedes García-Martínez
  • Loïc Barrault
  • Fethi Bougares
Abstract

We present a new approach to neural machine translation (NMT) that uses the morphological and grammatical decomposition of words (factors) on the output side of the neural network. This architecture addresses two main problems in MT: dealing with a large target-language vocabulary and handling out-of-vocabulary (OOV) words. By means of factors, we are able to handle a larger vocabulary and reduce training time (for systems with an equivalent target-language vocabulary size). In addition, we can produce new words that are not in the vocabulary. We use a morphological analyser to obtain a factored representation of each word (lemma, part-of-speech tag, tense, person, gender and number). We extend the attention-based NMT approach (Bahdanau et al., 2014) to produce two different outputs: one for the lemmas and one for the remaining factors. The final translation is built using a priori linguistic information. We compare our extension with a word-based NMT system. Experiments on the IWSLT’15 English-to-French dataset show that, while performance does not always improve, the system can manage a much larger vocabulary and consistently reduces the OOV rate. We observe an improvement of up to 2 BLEU points in a simulated out-of-domain translation setup.
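The two-output design described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a decoder hidden state `h` has already been computed, uses two separate linear-plus-softmax output layers (one over lemmas, one over the remaining factors collapsed into a single tag string), and replaces the morphological generator with a small hypothetical lookup table, `GENERATION_TABLE`. All names, the `POS:feature:...` tag format, the toy weights and the vocabularies are illustrative assumptions. The sketch also shows why factors can lower the OOV rate: a form such as `mangent` can be generated from a known lemma and a known tag even though it is absent from the word-level vocabulary.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical morphological generation table: (lemma, factor tag) -> surface word.
# It stands in for the a priori linguistic knowledge used to rebuild words;
# the tag format "POS:feature:..." is an illustrative assumption.
GENERATION_TABLE: Dict[Tuple[str, str], str] = {
    ("chat", "NOM:m:s"): "chat",
    ("chat", "NOM:m:p"): "chats",
    ("manger", "VER:pres:3:s"): "mange",
    ("manger", "VER:pres:3:p"): "mangent",
}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]], h: Sequence[float]) -> List[float]:
    """One logit per output unit: dot product of each weight row with h."""
    return [sum(w * x for w, x in zip(row, h)) for row in weights]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def factored_step(
    h: Sequence[float],
    w_lemma: Sequence[Sequence[float]],
    w_factor: Sequence[Sequence[float]],
    lemma_vocab: Sequence[str],
    factor_vocab: Sequence[str],
) -> Tuple[str, str]:
    """Both output layers read the same decoder state h but predict
    independently: one distribution over lemmas, one over factor tags."""
    p_lemma = softmax(linear(w_lemma, h))
    p_factor = softmax(linear(w_factor, h))
    return lemma_vocab[argmax(p_lemma)], factor_vocab[argmax(p_factor)]


def generate_surface(
    lemma: str, factors: str, table: Dict[Tuple[str, str], str]
) -> Optional[str]:
    """Rebuild the surface word; None if the pair is unknown to the table."""
    return table.get((lemma, factors))


def reachable_words(
    lemmas: Set[str], factors: Set[str], table: Dict[Tuple[str, str], str]
) -> Set[str]:
    """Surface words the factored sketch can emit with the given vocabularies."""
    return {w for (lem, fac), w in table.items() if lem in lemmas and fac in factors}


def oov_rate(tokens: Sequence[str], vocab: Set[str]) -> float:
    """Fraction of tokens not covered by vocab."""
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)


if __name__ == "__main__":
    lemma_vocab = ["chat", "manger"]
    factor_vocab = ["NOM:m:s", "NOM:m:p", "VER:pres:3:s", "VER:pres:3:p"]
    h = [1.0, 0.0]  # toy decoder state
    w_lemma = [[0.2, 0.0], [1.5, 0.0]]
    w_factor = [[0.1, 0.0], [0.3, 0.0], [0.5, 0.0], [2.0, 0.0]]

    lemma, tag = factored_step(h, w_lemma, w_factor, lemma_vocab, factor_vocab)
    print(lemma, tag, generate_surface(lemma, tag, GENERATION_TABLE))
    # prints: manger VER:pres:3:p mangent

    reference = ["chats", "mangent", "chat"]
    word_vocab = {"chat", "mange"}  # toy word-level vocabulary
    factored = reachable_words(set(lemma_vocab), set(factor_vocab), GENERATION_TABLE)
    print(f"word-level OOV: {oov_rate(reference, word_vocab):.3f}")  # 0.667
    print(f"factored OOV:   {oov_rate(reference, factored):.3f}")    # 0.000
```

Because the lemma and the tag are chosen independently, they can form a pair the generator does not know (for example a noun lemma with a verb tag). This sketch simply returns `None` in that case; it does not reproduce how the original system handles such pairs.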

Similar resources

A Comparative Study of English-Persian Translation of Neural Google Translation

Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...

Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

Pre-reordering, a preprocessing step to make the source-side word orders close to those of the target side, has been proven very helpful for statistical machine translation (SMT) in improving translation quality. However, is it the case in neural machine translation (NMT)? In this paper, we firstly investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate featur...

Word Representations in Factored Neural Machine Translation

Translation into a morphologically rich language requires a large output vocabulary to model various morphological phenomena, which is a challenge for neural machine translation architectures. To address this issue, the present paper investigates the impact of having two output factors with a system able to separately generate two distinct representations of the target words. Within this framew...

LIUM Machine Translation Systems for WMT17 News Translation Task

This paper describes LIUM submissions to WMT17 News Translation Task for English↔German, English↔Turkish, English→Czech and English→Latvian language pairs. We train BPE-based attentive Neural Machine Translation systems with and without factored outputs using the open source nmtpy framework. Competitive scores were obtained by ensembling various systems and exploiting the availability of target...

Statistical Machine Translation with Factored Translation Model: MWEs, Separation of Affixes, and Others

This paper discusses Statistical Machine Translation when the target side is a morphologically richer language. This paper intends to discuss the issues which are not covered by a factored translation model of Moses, especially targeting EN–JP translation: the effect of MultiWord Expressions, the separation of affixes, and other monolingual morphological issues. We intend to discuss these over a ...


Journal:
  • CoRR

Volume: abs/1609.04621

Publication date: 2016